Personalized pronunciation hints based on user speech

ABSTRACT

In an approach to analyzing a sound file, determining the language of the sound file and the display, creating a pronunciation map between the languages, generating a set of pronunciation hints based on the pronunciation map, and displaying the set of pronunciation hints, one or more computer processors identify a word from one or more words in a sound file. The one or more computer processors determine a dialect of spoken language for the word. The one or more computer processors determine a different language to display the word. The one or more computer processors retrieve one or more phonological rules based on the determined spoken language of the word and the determined different language to display the word. The one or more computer processors create a pronunciation map based on the retrieved phonological rules of the word.

BACKGROUND

The present invention relates generally to the field of language processing, and more particularly to creating sets of pronunciation maps based on the language of a sound file and a display and generating pronunciation hints.

A phoneme is one of the units of sound (or gesture in the case of sign languages) that distinguish one word from another in a particular language. For example, in most dialects of English, the sound patterns “/θʌm/” (thumb) and “/dʌm/” (dumb) are two separate words distinguished by the substitution of one phoneme, “/θ/”, for another phoneme, “/d/”. Two words that differ in meaning through a contrast of a single phoneme form what is called a minimal pair. In many other languages these would be interpreted as exactly the same set of phonemes (i.e., “/θ/” and “/d/” would be considered the same). In linguistics, phonemes (usually established by the use of minimal pairs, such as pat vs. bat) are written between slashes, e.g., “/p/”. To show pronunciation more precisely, linguists use square brackets, for example “[pʰ]” (indicating an aspirated p).

Phonemes are generally regarded as an abstraction of a set (or equivalence class) of speech sounds (phones) which are perceived as equivalent to each other in a given language. For example, in English, the k sounds in the words kit and skill are not identical, but they are distributional variants of a single phoneme “/k/”. Different speech sounds that are realizations of the same phoneme are known as allophones. Allophonic variation may be conditioned, in which case a certain phoneme is realized as a certain allophone in particular phonological environments, or it may be free, in which case it may vary randomly. In this way, phonemes are often considered to constitute an abstract underlying representation for segments of words, while speech sounds make up the corresponding phonetic realization, or surface form.

The International Phonetic Alphabet (IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association in the late 19th century as a standardized representation of the sounds of spoken language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators. The IPA is designed to represent only those qualities of speech that are part of oral language: phones, phonemes, intonation, and the separation of words and syllables. To represent additional qualities of speech, such as tooth gnashing, lisping, and sounds made with a cleft lip and cleft palate, an extended set of symbols, the extensions to the International Phonetic Alphabet, may be used. IPA symbols are composed of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter “<t>” may be transcribed in IPA with a single letter, “[t]”, or with a letter plus diacritics, “[t̺ʰ]”, depending on how precise one wishes to be. Often, slashes are used to signal broad or phonemic transcription; thus, “/t/” is less specific than, and could refer to, either “[t̺ʰ]” or “[t]”, depending on the context and language.

N-gram models are widely used in statistical natural language processing. In speech recognition, phonemes and sequences of phonemes are modeled using an n-gram distribution. For parsing, words are modeled such that each n-gram is composed of n words. For language identification, sequences of characters/graphemes (e.g., letters of the alphabet) are modeled for different languages. For sequences of characters, the 3-grams (sometimes referred to as “trigrams”) that can be generated from “good morning” are “goo”, “ood”, “od ”, “d m”, “ mo”, “mor” and so forth, counting the space character as a gram (sometimes the beginning and end of a text are modeled explicitly, adding “__g”, “_go”, “ng_”, and “g__”). For sequences of words, the trigrams (shingles) that can be generated from “the dog smelled like a skunk” are “# the dog”, “the dog smelled”, “dog smelled like”, “smelled like a”, “like a skunk” and “a skunk #”.
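By way of a non-limiting illustration, the character and word trigrams described above may be generated as in the following minimal sketch. The function names `char_ngrams` and `word_ngrams`, the padding parameter, and the `#` boundary token handling are assumptions made for illustration only and are not part of any described embodiment.

```python
def char_ngrams(text, n=3, pad=None):
    """Return the character n-grams of text, counting spaces as characters.

    If pad is given (e.g. "_"), one pad character is added at each end so
    that the beginning and end of the text are modeled explicitly.
    """
    if pad is not None:
        text = pad + text + pad
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(sentence, n=3, boundary="#"):
    """Return word n-grams (shingles), with a boundary token at both ends."""
    tokens = [boundary] + sentence.split() + [boundary]
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("good morning")[:6])
    print(word_ngrams("the dog smelled like a skunk"))
```

Passing `pad="_"` yields edge trigrams such as “_go” and “ng_”.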

Idiolect is an individual's distinctive and unique use of language, including speech. This unique usage encompasses vocabulary, grammar, and pronunciation. An idiolect differs from a dialect, which is a common set of linguistic characteristics shared among some group of people. An isogloss is the geographic boundary of a certain linguistic feature, such as the pronunciation of a vowel, the meaning of a word, or the use of some morphological or syntactic feature. Major dialects are typically demarcated by bundles of isoglosses, such as the Benrath line that distinguishes High German from the other West Germanic languages and the La Spezia-Rimini Line that divides the Northern Italian dialects from Central Italian dialects. However, an individual isogloss may or may not have any coincidence with a language border. For example, the front-rounding of “/y/” cuts across France and Germany, while the “/y/” is absent from Italian and Spanish words that are cognates with the “/y/”-containing French words.

SUMMARY

Embodiments of the present invention disclose a method, a computer program product, and a system for analyzing a sound file, determining the language of the sound file and the display, creating a pronunciation map between the languages, generating a set of pronunciation hints based on the pronunciation map, and displaying the set of pronunciation hints. The method includes one or more computer processors identifying a word from one or more words in a sound file. The one or more computer processors determine a dialect of spoken language for the word. The one or more computer processors determine a different language to display the word. The one or more computer processors retrieve one or more phonological rules based on the determined spoken language of the word and the determined different language to display the word. The one or more computer processors create a pronunciation map based on the retrieved phonological rules of the word.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram illustrating a distributed data processing environment, in accordance with an embodiment of the present invention;

FIG. 2 is a flowchart depicting operational steps of a pronunciation mapping program, on a server computer within the distributed data processing environment of FIG. 1, for analyzing a sound file, determining the language of the sound file and the display, creating a pronunciation map between the languages, generating a set of pronunciation hints based on the pronunciation map, and displaying the set of pronunciation hints, in accordance with an embodiment of the present invention;

FIG. 3 depicts an example of a spoken word converted into a phoneme with its associated images, in accordance with an embodiment of the present invention; and

FIG. 4 is a block diagram of components of the server computer executing the pronunciation mapping program, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Current speech-to-text and text-to-speech systems commonly mispronounce words, especially when the systems attempt to pronounce names of individuals. These systems traditionally utilize prebuilt dictionaries to determine pronunciation in a particular dialect. Furthermore, these systems only suggest a limited number of ways of pronouncing a term, irrespective of the phonic rules of the native language of the term. Traditionally, these systems translate auditory speech into a textual dictionary word and then utilize the dictionary word to generate pronunciation aids, ignoring the phonetics of the original speech language.

Embodiments of the present invention recognize that reliability and efficiency may be gained by creating a pronunciation mapping between the language of the speech and the language of the user, thereby circumventing the need of any user interaction to fix mistaken pronunciations and allowing the user to receive pronunciation hints that incorporate the phonetics of the pre-translated speech/word. This allows a user to pronounce a word in the dialect of another language while utilizing characters from the language of the user. Embodiments of the present invention further recognize that efficiency may be gained by the use of graphics or images to aid in pronunciation in addition to standard phonetic symbols. Implementation of embodiments of the invention may take a variety of forms, and exemplary implementation details are discussed subsequently with reference to the Figures.

The present invention will now be described in detail with reference to the Figures. FIG. 1 is a functional block diagram illustrating a distributed data processing environment, generally designated 100, in accordance with one embodiment of the present invention. The term “distributed” as used in this specification describes a computer system that includes multiple, physically distinct devices that operate together as a single computer system. FIG. 1 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made by those skilled in the art without departing from the scope of the invention as recited by the claims.

Distributed data processing environment 100 includes user computing device 104 and server computer 120 interconnected over network 102. Network 102 can be, for example, a telecommunications network, a local area network (LAN), a wide area network (WAN), such as the Internet, or a combination of the three, and can include wired, wireless, or fiber optic connections. Network 102 can include one or more wired and/or wireless networks that are capable of receiving and transmitting data, voice, and/or video signals, including multimedia signals that include voice, data, and video information. In general, network 102 can be any combination of connections and protocols that will support communications between user computing device 104, server computer 120, and other computing devices (not shown) within distributed data processing environment 100.

User computing device 104 may be a web server or any other electronic device or computing system capable of processing program instructions and receiving and sending data. In some embodiments, user computing device 104 may be a laptop computer, a tablet computer, a netbook computer, a personal computer (PC), a desktop computer, a personal digital assistant (PDA), a smart phone, or any programmable electronic device capable of communicating with network 102. In other embodiments, user computing device 104 may represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment. In general, user computing device 104 is representative of any electronic device or combination of electronic devices capable of executing machine readable program instructions as described in greater detail with regard to FIG. 4, in accordance with embodiments of the present invention. In the depicted embodiment, user computing device 104 includes user interface 106 and display 108.

User interface 106 is a program that provides an interface between a user of user computing device 104 and a plurality of applications that reside on user computing device 104 (e.g., telecommunication application (not depicted)) and/or may be accessed over network 102. A user interface, such as user interface 106, refers to the information (e.g., graphic, text, sound) that a program presents to a user and the control sequences the user employs to control the program. A variety of types of user interfaces exist. In one embodiment, user interface 106 is a graphical user interface. A graphical user interface (GUI) is a type of interface that allows users to interact with peripheral devices (i.e., external computer hardware that provides input and output for a computing device, such as a keyboard and mouse) through graphical icons and visual indicators as opposed to text-based interfaces, typed command labels, or text navigation. The actions in GUIs are often performed through direct manipulation of the graphical elements. In some examples, user interface 106 sends and receives information through network 102 to program 150.

Display 108 provides an output device for the presentation of information processed by program 150, which may be accessed over network 102. A user display, such as display 108, refers to the medium or device on which program 150 presents information (e.g., graphic, text, sound, haptic). A variety of types of user displays exist. In one embodiment, display 108 is a heads-up display. In another embodiment, display 108 is combined with user computing device 104. For example, program 150 presents the generated pronunciation hints on the display of user computing device 104 instead of an external display. In an alternative embodiment, display 108 is a standalone device that is accessible through network 102. For example, display 108 may be an external monitor capable of receiving streaming data. In various embodiments, display 108 is a haptic device capable of tactile responses (e.g., electronic braille, etc.).

Server computer 120 can be a standalone computing device, a management server, a web server, a mobile computing device, or any other electronic device or computing system capable of receiving, sending, and processing data. In other embodiments, server computer 120 can represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment. In another embodiment, server computer 120 can be a laptop computer, a tablet computer, a netbook computer, a personal computer (PC), a desktop computer, a personal digital assistant (PDA), a smart phone, or any programmable electronic device capable of communicating with user computing device 104 and other computing devices (not shown) within distributed data processing environment 100 via network 102. In another embodiment, server computer 120 represents a computing system utilizing clustered computers and components (e.g., database server computers, application server computers, etc.) that act as a single pool of seamless resources when accessed within distributed data processing environment 100. In the depicted embodiment, server computer 120 includes database 122 and program 150. In other embodiments, server computer 120 may contain other applications, databases, programs, etc. which have not been depicted in distributed data processing environment 100. Server computer 120 may include internal and external hardware components, as depicted and described in further detail with respect to FIG. 4.

Database 122 is a repository for data used by program 150. In the depicted embodiment, database 122 resides on server computer 120. In another embodiment, database 122 may reside on user computing device 104 or elsewhere within distributed data processing environment 100 provided program 150 has access to database 122. A database is an organized collection of data. Database 122 can be implemented with any type of storage device capable of storing data and configuration files that can be accessed and utilized by program 150, such as a database server, a hard disk drive, or a flash memory. In an embodiment, database 122 stores data used by program 150, such as historical pronunciation mappings, historical user recognition training, and metadata associated with historical users (e.g., languages spoken, user identification, etc.).

Phonological rules 124 is a corpus of phonetic processes for a variety of known languages. In an embodiment, phonological rules 124 contains sets of phonetic rules for each distinct language. In this embodiment, each set may contain series of phonological rules which include, but are not limited to, assimilation (a sound changes one of its features to be more similar to an adjacent sound), dissimilation (a sound changes one of its features to become less similar to an adjacent sound), insertion (an extra sound is added between two others), deletion (a sound, such as an unstressed syllable or a weak consonant, is not pronounced), and metathesis (two sounds switch places). In various embodiments, phonological rules 124 includes historical pronunciation mappings along with associated metadata such as the languages used, contextual metadata, historical phoneme-to-phoneme pairs, associated homonyms, and associated images. In another embodiment, the historical pronunciation maps are clustered into metrical trees where sentences and words are split into prosodic constituents (e.g., moras, syllables, feet, phonological words, clitic groups, phonological phrases, intermediate phrases, intonational phrases, phonological utterances, etc.). In yet another embodiment, phonological rules 124 contains linguistic maps which segment languages into hierarchical groups based on a feature-by-feature comparison of the languages. Typically, languages that derive from a shared ancestor language share common properties and are grouped together. In various embodiments, phonological rules 124 contains excerpts of user speech and associated user pronunciation mappings. In this embodiment, the pronunciation mappings are derived from the speech mannerisms of the user.
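As a non-limiting illustration, several of the rule types named above may be sketched as simple operations on a list of phones. The function names and the specific English examples (nasal place assimilation in “input”, epenthetic “p” in “hamster”, and “ask” pronounced as “aks”) are assumptions chosen for illustration; the specification does not prescribe how phonological rules 124 stores or applies its rules.

```python
def assimilate_nasal(phones):
    """Assimilation: /n/ becomes [m] before a bilabial consonant."""
    bilabials = {"p", "b", "m"}
    result = list(phones)
    for i in range(len(result) - 1):
        if result[i] == "n" and result[i + 1] in bilabials:
            result[i] = "m"
    return result


def insert_phone(phones, index, phone):
    """Insertion: add an extra sound before position index."""
    return phones[:index] + [phone] + phones[index:]


def delete_phone(phones, index):
    """Deletion: drop the sound at position index."""
    return phones[:index] + phones[index + 1:]


def metathesize(phones, index):
    """Metathesis: swap the sounds at index and index + 1."""
    result = list(phones)
    result[index], result[index + 1] = result[index + 1], result[index]
    return result


if __name__ == "__main__":
    print(assimilate_nasal(["ɪ", "n", "p", "ʊ", "t"]))         # "input"
    print(insert_phone(["h", "æ", "m", "s", "t", "ər"], 3, "p"))  # "hamster"
    print(metathesize(["æ", "s", "k"], 1))                      # "ask"
```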

Program 150 is a program for analyzing a sound file, determining the language of the sound file and the display, creating a pronunciation map between the languages, generating a set of pronunciation hints based on the pronunciation map, and displaying the set of pronunciation hints. In the depicted embodiment, program 150 is a standalone software program. In another embodiment, the functionality of program 150, or any combination of programs thereof, may be integrated into a single software program. In some embodiments, program 150 may be located on separate computing devices (not depicted) but can still communicate over network 102. In some embodiments, program 150 may implement the following steps. Program 150 detects a sound file containing a word, phrase, or sentence. Program 150 analyzes the sound file, generating lexical and structural metadata. In an embodiment, program 150 may convert the sound file to a textual format. In another embodiment, program 150 converts the sound file into textual phonemes. Program 150 utilizes natural language processing (NLP) techniques to determine the language of the speech contained in the sound file and the language of the display (e.g., display 108). Program 150 retrieves the phonetic rules of the determined languages. Program 150 creates a pronunciation mapping based on the retrieved phonetic rules and the generated sound file metadata. Program 150 generates textual and image-based pronunciation hints based on the created pronunciation mapping. Program 150 displays the pronunciation hints on the user display. Program 150 is depicted and described in further detail with respect to FIG. 2.

FIG. 2 is a flowchart depicting operational steps of program 150 for analyzing a sound file, determining the language of the sound file and the display, creating a pronunciation map between the languages, generating a set of pronunciation hints based on the pronunciation map, and displaying the set of pronunciation hints, in accordance with an embodiment of the present invention.

Program 150 analyzes the sound file (step 202). In an embodiment, the user initiates program 150 by inputting a sound file. For example, the user inputs a previously recorded sound file that contains the name “Fang” into program 150. In another embodiment, the user sends a notification to program 150 to initiate. In yet another embodiment, program 150 utilizes a microphone (not depicted) to detect auditory speech. In a further embodiment, program 150 creates a sound file based on detected auditory speech. In an alternative embodiment, program 150 creates a sound file when the language of the speech is determined to be distinct from the primary language of the user. For example, program 150 creates a sound file of a coworker speaking Spanish when the primary language of the user is English.

In one embodiment, the user utilizes program 150 to record the sound file utilizing a microphone (not depicted). For example, the user speaks the Chinese name “Fang” into the microphone, then program 150 detects and records the name into a sound file. In another embodiment, program 150 retrieves the sound file from a recording application (not depicted) or from a sound file repository (not depicted). In various embodiments, the user records a single word or name into the sound file. In alternative embodiments, the user records a sentence or phrase into the sound file. For example, the user records and inputs a sound file with the following phrase: “Fang wants to go to the store”. In this embodiment, program 150 segments the phrase into its individual words and linguistic components.

Responsive to program 150 detecting and retrieving the sound file, program 150 may analyze the sound file utilizing NLP techniques, specifically utilizing speech-to-text techniques such as speech recognition and speech segmentation. In an embodiment, program 150 utilizes speech recognition to generate a textual representation of the speech contained in the user sound file. In a further embodiment, program 150 utilizes speech segmentation to separate the user speech into individual words. In this embodiment, speech segmentation utilizes natural pauses between successive words to separate and delineate the individual words. In an alternative embodiment, program 150 utilizes Role and Reference Grammar (RRG) techniques to parse languages that do not contain spaces between words, such as Chinese or Japanese.
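As a non-limiting illustration of pause-based speech segmentation, the following minimal sketch splits a sequence of per-frame energy values wherever a sufficiently long run of quiet frames occurs. The function name, the energy threshold, and the minimum pause length are hypothetical values chosen for illustration; a practical system would derive frame energies from the audio signal and tune these parameters.

```python
def segment_on_pauses(energies, threshold=0.1, min_pause=3):
    """Split frames into (start, end) word segments, end exclusive.

    A segment closes once at least min_pause consecutive frames fall
    below threshold; shorter dips are kept inside the segment.
    """
    segments = []
    start = None   # index of the first frame of the current segment
    quiet = 0      # number of consecutive quiet frames seen so far
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Close the final segment, trimming any trailing quiet frames.
        segments.append((start, len(energies) - quiet))
    return segments


if __name__ == "__main__":
    frames = [0, 0.5, 0.6, 0, 0, 0, 0.7, 0.8, 0.9, 0, 0.4, 0]
    print(segment_on_pauses(frames))
```

In this sketch the single quiet frame inside the second segment is too short to count as a pause, so the frames around it are treated as one word.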

In yet another embodiment, program 150 utilizes machine learning algorithms such as Hidden Markov models, neural networks (e.g., deep feedforward neural networks (DNN), recurrent neural networks, etc.), and end-to-end automatic speech recognition (ASR) to further analyze and parse the user speech. The aforementioned techniques allow program 150 to determine the context, grammar, and semantics of the user speech contained in the sound file. In a further embodiment, the user may train program 150 by speaking or inputting text or isolated vocabulary. In this embodiment, program 150 analyzes the voice of the user and adjusts speech recognition properties and weights, resulting in increased recognition accuracy. In a further embodiment, program 150 stores or logs the identity of the user and associated speech metadata in order to allow retrieval of such data in subsequent uses. In various embodiments, program 150 identifies the sound file speaker and retrieves and applies associated training results. In an example situation, where the user has a strong accent, program 150 identifies and adjusts for the speech particularities of the user.

In an embodiment, program 150 decomposes the speech contained in the sound file into individual phonemes, allowing program 150 to distinguish words and languages. For example, program 150 decomposes the word thumb into the phoneme “/θʌm/”. In another embodiment, program 150 decomposes a word, phrase, or sentence into smaller lexical segments. For example, program 150 decomposes the term “dogs” into several lexical segments such as “d”, “do”, “dog” and finally “dogs”. In another embodiment, program 150 associates a grammatical rule with each segment, allowing program 150 to parse complex word, phrase, or sentence structures. In yet another embodiment, program 150 creates feature vectors for every identified phoneme that include nasal, consonantal, sonorant, and tone information.
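As a non-limiting illustration, the lexical segments of “dogs” and the per-phoneme feature vectors may be sketched as follows. The names `lexical_segments`, `PhonemeFeatures`, and `FEATURES` are hypothetical, and the small feature table covers only the three phonemes of “/θʌm/”; tone is left unset because English is not a tonal language.

```python
from dataclasses import dataclass
from typing import Optional


def lexical_segments(term):
    """Return the successively longer prefixes of a term."""
    return [term[:i] for i in range(1, len(term) + 1)]


@dataclass(frozen=True)
class PhonemeFeatures:
    nasal: bool
    consonantal: bool
    sonorant: bool
    tone: Optional[str] = None  # e.g. a tone label in a tonal language


# Illustrative feature table for the phonemes of "thumb" only.
FEATURES = {
    "θ": PhonemeFeatures(nasal=False, consonantal=True, sonorant=False),
    "ʌ": PhonemeFeatures(nasal=False, consonantal=False, sonorant=True),
    "m": PhonemeFeatures(nasal=True, consonantal=True, sonorant=True),
}


def feature_vectors(phonemes):
    """Look up a feature vector for each phoneme in the sequence."""
    return [FEATURES[p] for p in phonemes]


if __name__ == "__main__":
    print(lexical_segments("dogs"))
    for phoneme, vector in zip("θʌm", feature_vectors("θʌm")):
        print(phoneme, vector)
```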

Program 150 determines a language (step 204). Program 150 may determine the language of the sound file. In an embodiment, the sound file consists of one language. In this embodiment, program 150 determines the language of the sound file as a whole, since all words are in the same language. In an alternative embodiment, program 150 determines the language for each identified word in the sound file. For example, an English/Spanish speaker creates a sound file that contains phrases that intermix Spanish and English words. In this example, program 150 determines the language of each word rather than the language of the sound file as a whole. In one embodiment, the user inputs the sound file language and/or display language into program 150 via a user interface such as user interface 106. For example, if the language spoken in the sound file is Finnish, the user would designate the sound file language as Finnish. In another embodiment, program 150 utilizes the analyzed sound file components to determine the language of the sound file. In this embodiment, the analyzed sound file may be in a textual form.

Program 150 may determine the language of the display. In an embodiment, program 150 retrieves the display language from the settings or metadata located within a user interface (e.g., user interface 106). In a further embodiment, program 150 retrieves the display language from the host operating system. For example, if the host operating system is operating in English, then program 150 retrieves the language from the operating system and determines that the display language is English.

In an embodiment, if program 150 is unable to retrieve the language of the sound file and/or display, then program 150 utilizes text language identification techniques (e.g., n-gram, character encoding detection, text compressibility, Naïve Bayes, etc.) to determine the languages. In various embodiments, program 150 may utilize direct character comparison, where program 150 compares the characters used on the display to a set of known languages to determine the language of the display. In an alternative embodiment, program 150 compares the text compressibility of the display to the text compressibility of texts in known languages to determine the language of the display. In an example situation, program 150 determines that the display could be in any of multiple languages, an issue for lexically and structurally similar languages. In this example, program 150 prompts the user for verification or additional input.
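As a non-limiting illustration of n-gram based language identification, the following minimal sketch scores a text against character trigram profiles and returns no answer when the best candidates tie, which corresponds to the situation in which the user would be prompted. The function names and the two tiny sample sentences are assumptions made for illustration; a practical system would build its profiles from large corpora.

```python
from collections import Counter


def trigram_profile(text):
    """Count the character trigrams of a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def identify_language(text, profiles):
    """Return the best-matching language, or None if the result is ambiguous."""
    query = trigram_profile(text)
    scores = {
        lang: sum(min(count, profile[gram]) for gram, count in query.items())
        for lang, profile in profiles.items()
    }
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) > 1 and ranked[0] == ranked[1]:
        return None  # ambiguous: a caller could prompt the user here
    return max(scores, key=scores.get)


# Toy training sentences, for illustration only.
PROFILES = {
    "english": trigram_profile(
        "the quick brown fox jumps over the lazy dog and then the cat runs"
    ),
    "spanish": trigram_profile(
        "el perro perezoso salta sobre el gato y luego corre"
    ),
}

if __name__ == "__main__":
    print(identify_language("the dog", PROFILES))
    print(identify_language("el perro", PROFILES))
```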

Program 150 creates the pronunciation map (step 206). Program 150 utilizes the determined sound file and display languages (step 204) to retrieve associated phonological rules from a repository such as phonological rules 124. The associated phonological rules may comprise language-specific phonetic rules in addition to the following prosodic rules: syllable, onset and rime, articulatory gestures, articulatory features, mora, etc. In an alternative embodiment, program 150 retrieves equivalent phonological rules for sign languages that include, but are not limited to, movement, location, and handshape rules.

In various embodiments, program 150 analyzes the determined sound file and display languages for lexical similarity, that is, how similar the word sets of the two languages are. In an embodiment, program 150 calculates a linguistic distance, which is a measurement of the ability of speakers of one language to understand the other language. For example, the French and Spanish languages have a low linguistic distance, reflecting the numerous similarities of the two languages. In another embodiment, program 150 retrieves linguistic maps from phonological rules 124 to calculate the similarity of the languages. For example, if two languages share a common language ancestor, it can be assumed that the languages share common properties and phonetic rules. In yet another embodiment, program 150 utilizes lexicostatistics to compare two languages. In this embodiment, program 150 compares the percentage of lexical cognates (words that have a common origin) between the languages to determine the relationship of the languages and their similarities. For example, if two languages share a high percentage of lexical cognates, then program 150 calculates a low distance score between the languages, signifying a high level of language similarity.
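As a non-limiting illustration of the lexicostatistical comparison, a distance score may be computed as one minus the fraction of compared meanings whose words are cognate. The function name and the four-item French/Spanish list are assumptions made for illustration; the specification does not state a particular formula, and a practical comparison would use a much longer standardized word list.

```python
def lexicostatistical_distance(cognate_judgments):
    """Return 1 - (share of cognate pairs); lower means more similar.

    cognate_judgments maps each compared meaning to True when the two
    languages' words for it share a common origin.
    """
    if not cognate_judgments:
        raise ValueError("at least one compared meaning is required")
    shared = sum(1 for is_cognate in cognate_judgments.values() if is_cognate)
    return 1 - shared / len(cognate_judgments)


# Illustrative French/Spanish judgments for a handful of meanings.
FRENCH_SPANISH = {
    "night": True,   # nuit / noche
    "water": True,   # eau / agua
    "moon": True,    # lune / luna
    "dog": False,    # chien / perro
}

if __name__ == "__main__":
    print(lexicostatistical_distance(FRENCH_SPANISH))
```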

Responsive to calculating the similarity of the determined languages, program 150 creates a pronunciation mapping between the language of the terms in the sound file and the display language. In various embodiments, program 150 utilizes an alphabet/character map, especially useful with similar languages, between the determined sound file language and the determined display language. For example, if the sound file language is Chinese and the display language is English, then program 150 retrieves the alphabets of both languages and converts the Chinese characters into English equivalents. In an example situation, program 150 analyzes the sound file and determines that the sound file includes the Chinese name “Zhong”. In this situation, program 150 utilizes alphabet mapping to convert “Zhong” into the English equivalent “Gong”, where the Chinese “Zh” is mapped to the English “G”.
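As a non-limiting illustration, the alphabet/character mapping may be sketched as a longest-match-first substitution over the term. The function name and the one-entry map are assumptions made for illustration; only the “Zh” to “G” correspondence is taken from the example above.

```python
def apply_character_map(term, char_map):
    """Rewrite term using char_map, preferring the longest matching key."""
    keys = sorted(char_map, key=len, reverse=True)
    output = []
    i = 0
    while i < len(term):
        for key in keys:
            if term.startswith(key, i):
                output.append(char_map[key])
                i += len(key)
                break
        else:
            # No mapping applies at this position; keep the character as-is.
            output.append(term[i])
            i += 1
    return "".join(output)


if __name__ == "__main__":
    print(apply_character_map("Zhong", {"Zh": "G"}))
```

Sorting the keys by length means a multi-character key such as “Zh” is tried before any single-character key that might also match at the same position.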

In numerous embodiments, program 150 applies the retrieved phonetic rules to the textual terms analyzed and determined within the sound file. In this embodiment, program 150 converts and segments said terms into phonemes within their respective language. For example, the Russian name “Иван” (Ivan) is translated into the display language and segmented into the phoneme “[ee-van]”. In yet another embodiment, program 150 segments and converts said terms into the International Phonetic Alphabet (IPA). Continuing from the previous example, the Russian name “Иван” (Ivan) is converted into the IPA phoneme “[I.'van]”. In a further embodiment, the IPA phoneme is converted into a phoneme in the display language. For example, the IPA phoneme “[I.'van]” is converted to an English phoneme such as “[eye-van]”. In yet another embodiment, program 150 utilizes the rules retrieved from phonological rules 124 to convert the phoneme created from the terms in the sound file into a phoneme based on the display language. In various embodiments, program 150 creates phonemes based on the speech in the sound file rather than the textual and dictionary conversion of the speech. In a further embodiment, program 150 identifies idiolectic features of the created phonemes. In this embodiment, program 150 extracts phonological features (i.e., phonetics, syntax, semantics, morphology, etc.) that are unique to the speaker. In yet another embodiment, program 150 identifies isoglossic features of the phonemes. In this embodiment, program 150 extracts phonological features that are unique to a geographic area. In a further embodiment, program 150 retrieves rules from phonological rules 124 that correspond with the geographic area identified by the extracted isoglossic features.

For example, a Spanish speaker pronounces the name “Jose” as “[t∫oʊ'zeI]” or “[joe-say]” instead of the traditional pronunciation “[hoʊ'zeI]” or “[ho-zay]”. In this example, program 150 converts the phoneme of the speaker into either the IPA or display language phoneme equivalent based on the particular speech dialect, accent, and vocal patterns of the speaker.

Program 150 generates pronunciation hints (step 208). In an embodiment, program 150 generates textual pronunciation hints based on the determined phonemes from step 206. For example, if program 150 generates the phoneme “[I.'van]” for the name Ivan, then program 150 generates a pronunciation hint which includes the phoneme. In an embodiment, program 150 segments the phoneme into its respective syllables. In another embodiment, program 150 utilizes phonological rules 124 to incorporate information regarding tone, stress, context, etc. In another embodiment, program 150 generates pronunciation hints utilizing the converted IPA phonemes. In an alternative embodiment, program 150 generates the pronunciation hints utilizing the display language phonemes.
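As a non-limiting illustration, a textual hint may be produced by splitting the phoneme at its syllable boundaries and respelling each syllable with display-language text. The function names and the respelling table are hypothetical; the sketch assumes that syllables are separated by “.” and that stress is marked with a leading apostrophe, as in “[I.'van]”.

```python
def syllabify(phoneme):
    """Split a bracketed phoneme string into syllables, dropping stress marks."""
    core = phoneme.strip("[]/")
    return [syllable.lstrip("'ˈ") for syllable in core.split(".")]


def respell(phoneme, table):
    """Respell each syllable via table, keeping unknown syllables unchanged."""
    return "-".join(table.get(s, s) for s in syllabify(phoneme))


def textual_hint(name, phoneme, table):
    """Build a hint that shows both the phoneme and its respelling."""
    return f"{name}: {phoneme} [{respell(phoneme, table)}]"


# Hypothetical English respelling for the first syllable of the example.
ENGLISH_RESPELLING = {"I": "eye"}

if __name__ == "__main__":
    print(textual_hint("Ivan", "[I.'van]", ENGLISH_RESPELLING))
```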

In an embodiment, program 150 identifies additional terms that may rhyme with the determined phonemes. In a further embodiment, program 150 identifies the sounds represented by the phoneme and identifies terms that have a similar phonetic structure. In this embodiment, the rhyming term may rhyme with an individual syllable in the phoneme or with the phoneme as a whole. For example, if program 150 decomposes the name “Jose” into the English phoneme “[ho-zay]”, then program 150 identifies that the term “so” is phonetically similar to the syllable “ho” and the term “say” is phonetically similar to the syllable “zay”. In this example, program 150 identifies and generates the rhyming terms to assist the user in pronouncing the produced phoneme.
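
The rhyme lookup can be approximated, for illustration, by comparing the part of each syllable from its first vowel letter onward. This is a deliberately crude, spelling-based stand-in for true phonetic similarity; the word list, the `rime` helper, and the matching rule are assumptions and not the method used by program 150.

```python
from typing import Dict, List

VOWELS = set("aeiou")

# Hypothetical candidate vocabulary in the display language.
WORD_LIST = ["so", "go", "say", "day", "van"]


def rime(text: str) -> str:
    """Return the text from its first vowel letter onward."""
    for index, letter in enumerate(text):
        if letter in VOWELS:
            return text[index:]
    return text


def rhyming_terms(display_phoneme: str, words: List[str]) -> Dict[str, List[str]]:
    """Map each syllable of a respelled phoneme to words sharing its rime."""
    syllables = display_phoneme.strip("[]").split("-")
    return {
        syllable: [w for w in words if w != syllable and rime(w) == rime(syllable)]
        for syllable in syllables
    }


print(rhyming_terms("[ho-zay]", WORD_LIST))
```

With the sample word list, “ho” is matched with “so” and “go”, and “zay” with “say” and “day”.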

In various embodiments, program 150 identifies terms, and corresponding images, that are homophones or near-homophones of the identified terms within the sound file. In an embodiment, program 150 identifies the sounds used within the sound file phonemes and identifies terms that share the exact or a highly similar phonetic structure. Responsive to identifying said terms, program 150 may retrieve images that represent the terms. For example, if program 150 decomposes the term “Ivan” into the phoneme “[I.'van]”, then program 150 identifies terms that are associated with this phoneme. In this example, program 150 segments the phoneme into its respective syllables, “[I.]” and “['van]”, identifies that the terms “eye” and “van” share the same phonetic structure as the segmented phoneme, and then retrieves images associated with the terms “eye” and “van”, as depicted in FIG. 3.
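
The “eye” and “van” example can be sketched as two lookups: syllable to sound-alike term, then term to image. Both tables and the image paths below are hypothetical placeholders; the source does not specify how terms or images are stored or retrieved.

```python
from typing import Dict, List, NamedTuple, Optional


class ImageHint(NamedTuple):
    syllable: str
    term: str
    image: Optional[str]


# Hypothetical lookup tables (not from the source).
SOUND_ALIKE_TERMS: Dict[str, str] = {"I": "eye", "van": "van"}
TERM_IMAGES: Dict[str, str] = {"eye": "images/eye.png", "van": "images/van.png"}


def image_hints(
    phoneme: str, terms: Dict[str, str], images: Dict[str, str]
) -> List[ImageHint]:
    """Pair each syllable with a sound-alike term and its image, if any."""
    hints = []
    for part in phoneme.strip("[]").split("."):
        syllable = part.lstrip("'")
        term = terms.get(syllable)
        if term is not None:
            # images.get returns None when no image is known for the term.
            hints.append(ImageHint(syllable, term, images.get(term)))
    return hints


for hint in image_hints("[I.'van]", SOUND_ALIKE_TERMS, TERM_IMAGES):
    print(hint.syllable, "->", hint.term, hint.image)
```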

In other embodiments, program 150 adjusts the generated phonemes and pronunciation hints based on the speech of the user rather than traditional pronunciation maps. In this embodiment, program 150 utilizes the mannerisms, particularities, and/or deficiencies of the speech of the user to create user-specific pronunciation maps. These pronunciation maps map the manner in which the user speaks to how the sound would be traditionally spoken. For example, if the user has a speech impediment which prevents the user from producing r sounds, then program 150 adjusts the generated pronunciation hints to either replace the r sounds with a similar phoneme or de-stress the r sound.
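
A user-specific pronunciation map of this kind can be pictured as a substitution table applied to the segments of a hint. In the sketch below, the choice of “w” as the stand-in for an r sound is purely an illustrative assumption, as are the function names; the source does not say which similar phoneme would be chosen.

```python
from typing import Dict, List

# Hypothetical substitutes for sounds a user cannot produce.
SUBSTITUTES: Dict[str, str] = {"r": "w"}


def build_user_map(unavailable: List[str], substitutes: Dict[str, str]) -> Dict[str, str]:
    """Keep only the substitutions for sounds this user cannot produce."""
    return {sound: substitutes[sound] for sound in unavailable if sound in substitutes}


def adjust_hint(segments: List[str], user_map: Dict[str, str]) -> List[str]:
    """Replace each unavailable segment; leave all other segments unchanged."""
    return [user_map.get(segment, segment) for segment in segments]


user_map = build_user_map(["r"], SUBSTITUTES)
print(adjust_hint(["r", "o", "z"], user_map))  # prints ['w', 'o', 'z']
```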

Program 150 displays pronunciation hints (step 210). In some embodiments, in response to creating a pronunciation map and generating pronunciation hints, program 150 displays the generated pronunciation hints via display 108. In various embodiments, program 150 determines the capabilities of the display (e.g., display 108). Responsive to the capabilities of the display, program 150 may adjust the amount of information presented to the user. For example, if program 150 determines that the display has a low resolution and is incapable of displaying images clearly, then program 150 may present the pronunciation hint in a textual form, such as presenting the phoneme and/or any associated rhyming words without any additional images.
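
The capability check can be illustrated with a simple resolution threshold. The `Hint` and `DisplayInfo` types and the `MIN_IMAGE_PIXELS` cutoff are assumptions chosen for the example; the source does not define what counts as a low resolution.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hint:
    phoneme: str
    rhymes: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)


@dataclass
class DisplayInfo:
    width: int
    height: int


# Hypothetical threshold below which images are assumed to be unclear.
MIN_IMAGE_PIXELS = 320 * 240


def fit_to_display(hint: Hint, display: DisplayInfo) -> Hint:
    """Drop images on low-resolution displays; keep the textual parts."""
    if display.width * display.height < MIN_IMAGE_PIXELS:
        return Hint(hint.phoneme, list(hint.rhymes), [])
    return hint


hint = Hint("[I.'van]", ["eye", "van"], ["images/eye.png", "images/van.png"])
print(fit_to_display(hint, DisplayInfo(128, 64)).images)  # prints []
```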

In one embodiment, program 150 displays the pronunciation hints on a user computing device (e.g., user computing device 104). In various embodiments, program 150 may determine that the user computing device (e.g., user computing device 104) does not have a display surface and, therefore, program 150 may send auditory pronunciation hints to the user computing device. In an example situation where the user is blind, program 150 converts the textual pronunciation hints to auditory pronunciation hints. In this example situation, program 150 utilizes text-to-speech techniques to provide auditory pronunciation hints. In other embodiments, program 150 may provide haptic feedback instead of displaying the hints on a visual display. In a further embodiment, program 150 provides haptic feedback through an electronic braille system. In this embodiment, program 150 maps the phonemes to the braille equivalents. In this embodiment, program 150 provides pronunciation hints through haptic feedback in addition to the visual pronunciation hints. In some embodiments, program 150 may not control a device capable of interfacing with a user, but rather program 150 may send instructions to the device, which in turn displays or otherwise interacts with a user based on the received instructions.
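
One way to picture the braille mapping is to render a display-language respelling, such as “eye-van”, as uncontracted braille letters using the Unicode Braille Patterns block, in which dot n of a cell corresponds to bit n-1 above U+2800. This sketch assumes the hint is already respelled in Latin letters and that a letter-by-letter mapping is acceptable; the source does not say how phonemes are mapped to braille, and characters outside a to z are rendered here as a blank cell.

```python
from typing import Dict, Tuple

# Dot numbers for the uncontracted braille letters a to z.
BRAILLE_DOTS: Dict[str, Tuple[int, ...]] = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5), "k": (1, 3), "l": (1, 2, 3), "m": (1, 3, 4),
    "n": (1, 3, 4, 5), "o": (1, 3, 5), "p": (1, 2, 3, 4),
    "q": (1, 2, 3, 4, 5), "r": (1, 2, 3, 5), "s": (2, 3, 4),
    "t": (2, 3, 4, 5), "u": (1, 3, 6), "v": (1, 2, 3, 6),
    "w": (2, 4, 5, 6), "x": (1, 3, 4, 6), "y": (1, 3, 4, 5, 6),
    "z": (1, 3, 5, 6),
}

BLANK_CELL = "\u2800"


def cell(dots: Tuple[int, ...]) -> str:
    """Convert dot numbers to the matching Unicode braille pattern."""
    return chr(0x2800 + sum(1 << (dot - 1) for dot in dots))


def to_braille(text: str) -> str:
    """Map letters to braille cells; anything else becomes a blank cell."""
    return "".join(
        cell(BRAILLE_DOTS[ch]) if ch in BRAILLE_DOTS else BLANK_CELL
        for ch in text.lower()
    )


print(to_braille("eye-van"))
```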

FIG. 3 depicts sample pronunciation map 300, which is an example illustration of program 150 generating a pronunciation hint with its associated images, as demonstrated in steps 206 and 208. Sample pronunciation map 300 includes spoken word 302, phoneme 304, and pronunciation hint 306. Pronunciation hint 306 includes the segmented phoneme along with its associated images.

In a detailed example of flowchart 200, an English-speaking user inputs a sound file containing the name “Иван” (“Ivan”) spoken by a Russian speaker into program 150. Upon detection of the sound file, program 150 analyzes the speech contained in the sound file and extracts phonological metadata, as discussed in step 202. After analyzing the sound file, program 150 determines the languages of the display and of the speech contained in the sound file as depicted in spoken word 302, as discussed in step 204. Based on the determined languages, program 150 creates a pronunciation map between the languages, as discussed in step 206 and depicted in phoneme 304. Responsive to program 150 creating the pronunciation map, program 150 generates pronunciation hints and retrieves associated images, as discussed in step 208 and depicted in pronunciation hint 306. Program 150 then displays the generated pronunciation hints on a user computing device, as discussed in step 210.

FIG. 4 depicts a block diagram 400 of components of server computer 120 in accordance with an illustrative embodiment of the present invention. It should be appreciated that FIG. 4 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made.

Server computer 120 includes communications fabric 404, which provides communications between cache 403, memory 402, persistent storage 405, communications unit 407, and input/output (I/O) interface(s) 406. Communications fabric 404 can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system. For example, communications fabric 404 can be implemented with one or more buses or a crossbar switch.

Memory 402 and persistent storage 405 are computer readable storage media. In this embodiment, memory 402 includes random access memory (RAM). In general, memory 402 can include any suitable volatile or non-volatile computer readable storage media. Cache 403 is a fast memory that enhances the performance of computer processor(s) 401 by holding recently accessed data, and data near accessed data, from memory 402.

Program 150 may be stored in persistent storage 405 and in memory 402 for execution by one or more of the respective computer processor(s) 401 via cache 403. In an embodiment, persistent storage 405 includes a magnetic hard disk drive. Alternatively, or in addition to a magnetic hard disk drive, persistent storage 405 can include a solid-state hard drive, a semiconductor storage device, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, or any other computer readable storage media that is capable of storing program instructions or digital information.

The media used by persistent storage 405 may also be removable. For example, a removable hard drive may be used for persistent storage 405. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer readable storage medium that is also part of persistent storage 405.

Communications unit 407, in these examples, provides for communications with other data processing systems or devices. In these examples, communications unit 407 includes one or more network interface cards. Communications unit 407 may provide communications through the use of either or both physical and wireless communications links. Program 150 may be downloaded to persistent storage 405 through communications unit 407.

I/O interface(s) 406 allows for input and output of data with other devices that may be connected to server computer 120. For example, I/O interface(s) 406 may provide a connection to external device(s) 408, such as a keyboard, a keypad, a touch screen, and/or some other suitable input device. External devices 408 can also include portable computer readable storage media such as, for example, thumb drives, portable optical or magnetic disks, and memory cards. Software and data used to practice embodiments of the present invention, e.g., program 150, can be stored on such portable computer readable storage media and can be loaded onto persistent storage 405 via I/O interface(s) 406. I/O interface(s) 406 also connect to a display 409.

Display 409 provides a mechanism to display data to a user and may be, for example, a computer monitor.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The terminology used herein was chosen to best explain the principles of the embodiment, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is:
 1. A method comprising: retrieving, by one or more computer processors, one or more phonological rules based on a determined spoken language of a word, a determined different language to display the word, and a calculated linguistic distance based on the determined spoken language and the determined different language to display; and creating, by one or more computer processors, a pronunciation map based on the retrieved phonological rules of the word.
 2. The method of claim 1, wherein retrieving one or more phonological rules based on the determined spoken language of the word and the determined language to display the word further comprises: identifying, by one or more computer processors, one or more idiolectic features of the word; identifying, by one or more computer processors, one or more isoglossic features of the word; decomposing, by one or more computer processors, the word into phonemes based on the one or more idiolectic and isoglossic features of the word; and retrieving, by one or more computer processors, additional phonological rules based on the decomposed phonemes.
 3. The method of claim 1, wherein creating, a pronunciation map based on the retrieved phonological rules of the word further comprises: decomposing, by one or more computer processors, the word into segmented phonemes based on the dialect of the spoken word; identifying, by one or more computer processors, related phonemes in the phonological rules of the language of the display; and mapping, by one or more computer processors, the decomposed phonemes with the identified related phonemes.
 4. The method of claim 1, retrieving one or more phonological rules based on the determined spoken language of the word and the determined language to display the word further comprises, retrieving, by one or more computer processors, prosodic rules which include one or more of the following rules: syllable, onset and rime, articulatory gestures, articulatory features, and mora.
 5. The method of claim 1 further comprising, generating, by one or more computer processors, one or more pronunciation hints based on the created pronunciation map.
 6. The method of claim 5, wherein generating one or more pronunciation hints based on the created pronunciation map further comprises, displaying, by one or more computer processors, the pronunciation hints on a display.
 7. The method of claim 5, wherein generating one or more pronunciation hints based on the created pronunciation map further comprises displaying, by one or more computer processors, on an electronic braille display.
 8. The method of claim 5, wherein generating pronunciation hints based on the created pronunciation map further comprises, decomposing, by one or more computer processors, the one or more pronunciation hints into phonemes.
 9. The method of claim 5, wherein generating pronunciation hints based on the created pronunciation map further comprises: identifying, by one or more computer processors, one or more images related to the decomposed phonemes; retrieving, by one or more computer processors, the one or more images related to the decomposed phonemes; and displaying, by one or more computer processors, the decomposed phonemes and the retrieved one or more images.
 10. A computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to retrieve one or more phonological rules based on a determined spoken language of a word, a determined different language to display the word, and a calculated linguistic distance based on the determined spoken language and the determined different language to display; and program instructions to create a pronunciation map based on the retrieved phonological rules of the word.
 11. The computer program product of claim 10, wherein the program instructions to retrieve one or more phonological rules based on the determined spoken language of the word and the determined language to display the word further comprise program instructions to: identify one or more idiolectic features of the word; identify one or more isoglossic features of the word; decompose the word into phonemes based on the one or more idiolectic and isoglossic features of the word; and retrieve additional phonological rules based on the decomposed phonemes.
 12. The computer program product of claim 10, wherein the program instructions to create a pronunciation map based on the retrieved phonological rules of the word further comprise program instructions to: decompose the word into segmented phonemes based on the dialect of the spoken word; identify related phonemes in the phonological rules of the language of the display; and map the decomposed phonemes with the identified related phonemes.
 13. The computer program product of claim 10, further comprising program instructions, stored on the one or more computer readable storage media, to: generate one or more pronunciation hints based on the created pronunciation map.
 14. The computer program product of claim 13, wherein the program instructions to generate pronunciation hints based on the created pronunciation map further comprise program instructions to: decompose the one or more pronunciation hints into phonemes.
 15. The computer program product of claim 13, wherein the program instructions to generate pronunciation hints based on the created pronunciation map further comprise program instructions to: identify one or more images related to the decomposed phonemes; retrieve the one or more images related to the decomposed phonemes; and display the decomposed phonemes and the retrieved one or more images.
 16. A computer system comprising: one or more computer processors; one or more computer readable storage media; and program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising: program instructions to retrieve one or more phonological rules based on a determined spoken language of a word, a determined different language to display the word, and a calculated linguistic distance based on the determined spoken language and the determined different language to display; and program instructions to create a pronunciation map based on the retrieved phonological rules of the word.
 17. The computer system product of claim 16, wherein the program instructions to retrieve one or more phonological rules based on the determined spoken language of the word and the determined language to display the word further comprise program instructions to: identify one or more idiolectic features of the word; identify one or more isoglossic features of the word; decompose the word into phonemes based on the one or more idiolectic and isoglossic features of the word; and retrieve additional phonological rules based on the decomposed phonemes.
 18. The computer system product of claim 16, wherein the program instructions to create a pronunciation map based on the retrieved phonological rules of the word further comprise program instructions to: decompose the word into segmented phonemes based on the dialect of the spoken word; identify related phonemes in the phonological rules of the language of the display; and map the decomposed phonemes with the identified related phonemes.
 19. The computer system product of claim 16, further comprising program instructions, stored on the computer readable storage media for execution by at least one of the one or more processors, to: generate one or more pronunciation hints based on the created pronunciation map.
 20. The computer system product of claim 19, wherein the program instructions to generate pronunciation hints based on the created pronunciation map further comprise program instructions to: identify one or more images related to the decomposed phonemes; retrieve the one or more images related to the decomposed phonemes; and display the decomposed phonemes and the retrieved one or more images.